Deeper insights into long-term survival heterogeneity of pancreatic ductal adenocarcinoma (PDAC) patients using integrative individual- and group-level transcriptome network analyses

Pancreatic ductal adenocarcinoma (PDAC) is categorized as the leading cause of cancer mortality worldwide. However, its predictive markers for long-term survival are not well known. It is interesting to delineate individual-specific perturbed genes when comparing long-term (LT) and short-term (ST) PDAC survivors and integrate individual- and group-based transcriptome profiling. Using a discovery cohort of 19 PDAC patients from CHU-Liège (Belgium), we first performed differential gene expression analysis comparing LT to ST survivor. Second, we adopted systems biology approaches to obtain clinically relevant gene modules. Third, we created individual-specific perturbation profiles. Furthermore, we used Degree-Aware disease gene prioritizing (DADA) method to develop PDAC disease modules; Network-based Integration of Multi-omics Data (NetICS) to integrate group-based and individual-specific perturbed genes in relation to PDAC LT survival. We identified 173 differentially expressed genes (DEGs) in ST and LT survivors and five modules (including 38 DEGs) showing associations to clinical traits. Validation of DEGs in the molecular lab suggested a role of REG4 and TSPAN8 in PDAC survival. Via NetICS and DADA, we identified various known oncogenes such as CUL1 and TGFB1. Our proposed analytic workflow shows the advantages of combining clinical and omics data as well as individual- and group-level transcriptome profiling.

Pancreatic ductal adenocarcinoma (PDAC) accounts for 90% of pancreatic tumors and 10% of gastrointestinal cancers 1 . It is the 4th leading cause of cancer-related death worldwide while remaining the most lethal among digestive cancers 2 with only few treatment therapies 3,4 . PDAC has a complex and dense tumor microenvironment that poses a significant barrier to treatment administration 5 . In general, various factors shape the outcome for complex diseases leading to perturbations of a complex intracellular network 6 . Disease-relevant genes typically do not operate on their own 7 . Network approaches that naturally acknowledge interactions and allow integration with regulatory factors are thus required to map phenotypic variability of complex diseases, including PDAC fully.
For PDAC, various studies have shown the influence of lymph node, lymphovascular and perineural invasion, surgical resection margin, chemotherapy 8-10 on prognosis. The overall survival of patients may also be coupled to the mutational status of Kirsten rat sarcoma viral oncogene (KRAS) as well as several morphological www.nature.com/scientificreports/ to methods section-Group based DEGs analysis). Sixteen genes contained an Ig domain, followed by a V-set domain. Based on the clusterprofiler based enrichment analysis of DEGs against the interpro domains, we identified three significant enrchied domains (Cytosolic fatty-acid binding, Intracellular lipid binding protein and Glycoprotein hormone subunit beta) under threshold of p.adjusted < 0.05. Fifty three prognostic genes (p-value < 0.05) were identified from all DEGs (Table S4). We observed 22 DEG genes containing at least one domain that overlapped with the survival gene set (Fig. 2C). GKN1 was found as part of oncogenic signatures (ATM_DN.V1_UP: c6 MSigDB dataset). Also, GKN1 consisted of BRICHOS domain, found in a variety of proteins. We furthermore identified two genes HIST1H1T, and SOX10 (disease-associated), consisting of the linker histone protein domain and Sox_HMG box, respectively, probably implying these genes' regulatory role in PDAC survival mechanisms. Another gene, miR-765, showed a significant increase in survival in long-term patients with lower expression compared to ST (Fig. S2).These results highlight the potential of the identified genes in further understanding molecular underpinnings of PDAC survival. RT-qPCR confirmed the differential expression observed in LT versus ST for the genes represented in Fig. S2A. Among them, the DEGs REG4 and TSPAN8 were validated in the lab via RT-qPCR analysis (Fig. S3). In addition, several DEGs from studied cohort such as CRISP3, PCK1 Group-level survival heterogeneity: gene co-expression modules significantly associated with clinical traits and their corresponding 3D architectures. All 19 samples with clinical information and gene expression data were included in WGCNA (refers to methods section-Group-level survival heterogeneity). Genes with similar expressions were grouped into gene modules via average linkage hierarchal clustering. In this study, power of β = 14 (scale free r 2 > 0.85) was selected as soft thresholding to ensure the scale free topology. By use of a dynamic tree-cutting algorithm, a total of 96 distinct co-expression modules were identified. Correlated modules were merged with a cut-off height of 0. 25 (Fig. S4). Clinically relevant significant modules were imported into Cytoscape, and gene-gene interactome network were developed for M34 module (Fig. 3B). Module M9 was found to be significantly associated with tumor size (r 2 = 0.72, adjusted p-value = 0.01) and T stage (r 2 = 0.68, adjusted p-value = 0.03). M9 consisted of the highest number of DEGs (27 genes). Two other modules, M7 (r 2 = 0.73, adjusted p-value = 0.01) and M30 (r 2 = 0.71, adjusted p-value = 0.02), were negatively associated with time between surgery and chemotherapy clinical traits. M30 contained 10 DEGs. Module M34 was significantly associated with tumor size by imagery (r 2 = 0.67, adjusted p-value < 0.05). Interestingly, two modules were significantly associated with chemotherapy: a positive association for M15 (r 2 = 0.68, adjusted p-value = 0.04) and a negative association for M9 (r 2 = −0.68, adjusted p-value < 0.04). The overlap between DEGs and genes in five modules (M7, M9, M15, M30, M34) is shown in a Venn-Diagram (Fig. 3C), from which we can identify 27, 10, and 1 gene as part of M9, M30, and M34, respectively. are shown as mixed bar and heat map plot. P1 to P13 refer to patient specific clinical traits analyzed in this study (selective data has been shown in plot; full details given in Table S1). Tumor stage (from 1 to 4). OS (in months), tumor size by imagery (in mm) and Time between imagery and surgery are indicated in the graph. OS clinical trait denotes overall survival and was used for the development of the Kaplan-Meier survival curves for short-term (ST) and long-term (LT) PDAC Survivors (ST: S1 to S10; LT: L1 to L9); (B) Identification of significant gene ontology of associated up and down-regulated DEGs and their relevant functions. Up and down-regulated genes are highlighted with red and green dots, respectively. The size of data points increases with increased significance (uncorrected for multiple testing); (C) Venn-diagram showing the number of identified genes that are common to or different in multiple first-line analysis strategies (CDD: conserved domain analysis, DGE: differential gene expression analysis, SA: survival analysis. www.nature.com/scientificreports/ Group-level survival heterogeneity: functional analysis of clinically relevant gene co-expression modules. Clinically relevant gene modules (i.e., modules identified by WGCNA as significantly associated with clinical traits) were functionally followed up in Cytoscape with the ClueGO plug-in (Group-level and Individual-specific analyses) that visualizes large clusters of genes in a functionally grouped network. Module M9 was linked to 33 significant pathways (multiple testing adjusted p-value < 0.05) distributed over ten groups, such as extracellular matrix organization (86 genes) and collagen formation (37 genes) (data not shown). Genes regulating the cell cycle and modulating extracellular matrix at molecular or cellular levels have been linked to cancer drug targeting and cancer cell plasticity 32 . M7, also negatively associated with chemotherapy, contained 91 significant pathways, distributed into three groups, such as proteasome (4 genes) and the regulation of RAS by GAPs (5 genes) (Fig. 4A). Module M15, positively associated with 'chemotherapy' , was enriched with 11 significant pathways distributed into five groups, such as the inositol phosphate metabolism (3 genes) and muscle contraction (9 genes) (Fig. 4B). In module M34, we found three significant Reactome pathways distributed into three groups: the effects of PIP2 hydrolysis (4 genes), the deactivation of the beta-catenin transactivating complex (3 genes) and the VEGFA-VEGFR2 pathway (4 genes) (data not shown). In M30, we found two significant pathways: apoptotic cleavage of cell adhesion proteins (4 genes) and o-linked glycosylation (  Individual-specific survival heterogeneity: functional pathway and domain analysis in long-term PDAC survivors. We furthermore examined the extent to which the individual patterns in LT survivors reflected disruptions in KEGG and Reactome pathways and identified multiple pathways that were significantly enriched in at least one LT individual (Table S5). In-depth analysis revealed that 17 pathways (out of 192) were common to at least two LT survivors (Fig. 5B). Thus, 175 pathways were uniquely perturbed in an LT PDAC survivor (i.e. not shared among LT survivors). Individuals (LT1, LT3, LT4, LT5, LT6) did not show significant enrichment in any KEGG/Reactome pathway. Based on the presence/absence of enriched pathways across LT survivors (LT2, LT7, LT8, LT9), two-way hierarchical clustering revealed three clusters (Fig. S8). First two clusters (C1 and C2) showed enriched pathways in two LT only. C1 consisted of 14 pathways was collectively enriched in L7 and L9 and highlighted a strong association with cancer-related pathways. C2 showed enrichment of 13 pathways between L9 and L2, such as Proteoglycans in cancer and EPH-Ephrin signaling. Smallest cluster, C3, consisted of 8 pathways across three LT survivors, i.e. LT2, LT7, LT9. Deeper hierarchical clustering groups C2 and C3 into single supercluster based on similar pathways profiles. www.nature.com/scientificreports/ A primary goal of molecular biology is to determine the mechanisms that control gene transcription. Specific domain structures of genes play a significant role in gene regulation and expression. Hence, we also investigated the domain structures of perturbed genes in PEEPs of LT to understand their potential regulatory mechanism in LT survival. A total of 47 enriched domains (adjusted p-value < 0.05) were identified (Table S6). Two-way hierarchical clustering (biclustering) based on motif enrichment profiles (present or absent) across all LT survivors resulted in four clusters (Fig. S8). The first cluster (C1), represented by LT7 and LT9, was enriched with six domains. The second cluster (C2), active in LT2 and LT7, was enriched with seven domains. The third clusters (C3) involved enrichment of 7 domains shared two among LT survivors (Table S6). The fourth cluster (C4) was largely shared by three LT survivors (LT2, LT7, and LT9). This cluster involved five domains: IPR013032, PS01186, IPR000742, PS00022, and IPR009030. Deeper hierarchical clustering groups C1 and C4 into single supercluster based on similar protein domains profiles.
In addition, for each LT survivor, we constructed two hierarchal trees based on the genes potentially involved in multiple domains and pathways, one for each for LT survivor. More in-depth analysis revealed a common gene set between cluster 24 obtained from gene-level clustering and cluster 1 (C1) derived from pathway-level biclustering (Figs. S8, S9 and S10). Similarly, cluster 25 derived from gene level analysis showed overlap with cluster 2 (C2) derived from pathway-level biclustering.
Exploitation of gene connectivity: systems views. Gene connectivity via reference networks can further highlight interesting gene clusters linked to LT survivors. In a first approach, we developed a disease module via DADA 7,33 . The latter uses the human protein interactome network structure to prioritize disease genes while also removing possible biases induced by gene degree distributions (refers to methods section-Individualspecific survival heterogeneity). The disease module hypothesis proposes that disease regulatory genes form one or a few large connected components in a human interactome. In this study, we restricted our seed genes (i.e., genes that play significant roles in PDAC according to the prior biological knowledge) to PDAC survival (SMAD4, CDKA2, and KRAS) and PDAC responsiveness based on a literature search and as identified from the DisGeNET database 34 (Table S7). Only the top 1% of DADA ranked genes were retained (Figs. 6A I-IV; 1J), leading to 70 genes. Only one DADA top gene was also previously identified as DEG (DKK4), as shown in (Fig. 6C). We also looked at the overlap between DADA-based 1% top-ranked genes and perturbed genes as highlighted Figure 5. Genomic distributions of differentially expressed genes (DEGs) and PEEPs related to PDAC survivors using Circos plots and functional profiles of perturbation data: (A) first (grey) and second outermost circle labeled with numbers represent chromosomes (multiple colors); the third outermost track represents DEGs (red and green indicating, respectively up-regulated and down-regulated DEGs as scattered points); the fourth outermost circle represents genomic locations of genes associated with survival (purple lines); the nine innermost circles (highlighted in orange) refer to the z-score for each LT survivor (ranging from LT1 to LT9); (B) Enriched KEGG pathways (P1 to P19 (out of 196)) shown via Circos Table Viewer. Each link refers to an LT survivor and a significantly enriched pathway (adjusted p-value < 0.05) based on the perturbed gene set found in that individual (data for LT2, LT7 and LT9 are shown). Uniquely enriched pathways across LT survivors are given in Table S2. www.nature.com/scientificreports/ by the PEEPs of individuals belonging to the long-term survival PDAC patient group (Fig. 1L). There were 23 genes in total. None of these common genes had previously been identified as DEGs. Out of 23, we identified 7 DADA top-ranked genes in common to clinical gene modules as identified before (Fig. 6C, 1K; Table S6). Only a single gene was shared by at least (actually exactly) three LT subjects, namely GLI2. Three genes (RAC1, FOSL1, and EGF) were shared by two out of 9 LT survivor PEEPs. Furthermore, three genes (JAG2, TGFA, HDAC1) were uniquely perturbed in a LT survivor (Fig. 6B). We integrated individual-specific gene perturbation information (from PEEPs) with group-level DEG findings. For this, we used NetICS, which further allows unraveling inter-and intra-patient gene expression heterogeneity (Individual-specific survival heterogeneity; Fig. 1E). Also, in this approach, a ranked list of genes was generated. The ranks are based on the gene scores acquired through network diffusion algorithms (Fig. 7A, Table S8). Similar to the DADA approach, we focused on the top 1% of ranked genes for each LT survival patient, leading to 500 genes. Those 500 genes constituted a subset of PEEP genes. Only 13 genes out of 500 were also DEGs, including 6 genes that were additionally linked to clinical disease modules (Fig. 7A). Among these 13 DEGs, TNNI3 was NetICS top ranked, and was shared in its significance by 3 out of 9 LT survivors. It was also associated with the M7 module of clinical relevance (Figs. 7A, 1G; Table S8). Notably, NOSTRIN, a unique to NetICS gene (i.e., not highlighted by any other method shown in Fig. 7A) was common to 7 out of 9 LT subjects. Furthermore, we found 14 genes common to DADA and NetICS gene prioritization methodologies (Figs. 7B, 1J; S11; Table S9), involving the pathways such GPCR, Notch signaling pathway and many others. This common gene set did not include TNNI3 nor NOSTRIN. The percentage of LT PEEP genes not included in the top 1% DADA gene list is 27% (384/1440) and is similar to the percentage of LT PEEP genes not included in the top 1% NetICS gene list (263/963).

Discussion
Identifying molecular PDAC cancer drivers is critical for implementing precision medicine in clinical practice. Typically, the optimization and fine tuning of gene prioritization methods require large datasets 35 . Despite the small sample size of this study, we identified genes showing associations with multiple clinical traits 36 and derived plausible links between long-term survival of patients and genes, pathways and protein domains by exploiting multiple approaches, including the combination of individual-level with group-level information in integrated analysis workflows. Throughout the entire study, we have relied on several statistical techniques and approaches to determine statistical significance with small samples (including non-parametric tests and empirically derived p-values). www.nature.com/scientificreports/ PDAC accounts for over 90% of pancreatic cancer and is a lethal malignancy with very high mortality rates. The gene regulatory landscape of PDAC is defined by four mutational "mountains" (KRAS, TP53, CDKN2A, SMAD4), which are the main drivers of PDAC 37 . Thus, cancer diseases are heterogeneous at different scales: group level, individual level, tumor type, cell level. This study reports on PDAC gene expression differences in patients who survived longer than 36 months (LT) or less than 12 months (ST). Via advanced genomic profiling of PDAC survivors, we aimed to obtain more insights into LTS-relevant biological mechanisms that contribute to PDAC heterogeneity.
In this work, we identified known PDAC driver genes associated with survival, including ROBO2, ZG16B, ANXA8, CEACAM5, CYP24A1, GPR87, GSDMC, KLK6, KRT14, KRT6A, MMP13, MUC16, S100A2, SER-PINB3, TRIM31, TSPAN8 and PLXNA1 [38][39][40] (Supplementary S2). Concordance result has been observed in the independent cohorts as well. In addition, a thorough investigation of gene expression differences between long-term and short-term PDAC survivors highlighted gene involvement in immune responses (CEACAM20, C6orf13, IRS4, CXCL17), cell cycle (SPDYE3, HLA-DQA2, CLDN) and metabolic pathways (GBA3, LIPN), further highlighting the importance of these pathways in PDAC disease sruvival 41,42 . Association of LT survivors with Immunogenic (A2) subtypes (Bailey et al. 2016) confirms the importance of identified immune specific pathways. These findings provide mounting evidence that differential expressed genes (FABP2 , IGKV1D-8, TFF1, TFF2) linked to immune responses could be useful in the development of effective therapies for PDAC survival 43  We also identified a downstream target of KRAS (MUC16) as DEG, supporting KRAS implications in survival 44 . Also, we observed modifications of GKN1, KRT6, and ANKRD43 gene expressions in LTS, known to induce apoptosis and a higher metastasis in other cancer type 45,46 . In addition, a previous study showed REG4 as a serological marker for PDAC 47 . Very little information exists, though, about the role of TSPAN8 in PDAC. However, TSPAN8 was recently shown to promote cancer cell stemness via activation of sonic Hedgehog signaling 48 . Validation of a selection of DEGs via experimental work confirmed a role of REG4 and TSPAN8 in PDAC survival mechanisms. These molecular lab results indicate the interplay between the procession of tumorigenesis in PDAC and whole-body metabolism 49 , which could be regulated individually or in combination with various factors in survival patients. The presence of multiple immunogenic domains (IGV, V-SET) in identified DEGs further supports recent activities towards immunological targets for cancer therapy 50 . This indicates in-depth investigation of immunity cycles in relation to long-term survival in PDAC patients.
Systems biology approaches can provide immediate functional insights by revealing interactions between genes and cross-talks between biological processes 51 . A motivation for WGCNA is that genes functioning together are regulated or co-expressed together 52 . Ballouz and cauthor 53 suggested a minimal of 20 samples to predict meaningful functional connectivity. This forced us to pool ST and LT together for WGCNA analysis on 19 patients and to link thus identified gene modules or their constituents to clinical traits with non-parametric  Additionally, multiple targets have been identified in the form of DEGs being associated with numerous traits such as tumor size and the time between surgery and chemotherapy. In our study, we identified several clinically relevant WGCNA gene modules (e.g., a gene module associated to time between surgery and chemotherapy with DEGs LYZ, DKK4, CA14, NASE7, TSPAN8, GKN1, GKN2, SNORD116-18, DKK4), which warrants further exploration on increased sample sizes in the future. Notably, TSPAN8 serves as a prognostic marker in other cancer types as well 48 . Apart from time between surgery and chemotherapy, time to surgery may play an important role in PDAC that has been associated with an increase in tumor size 55 . DEG DKK4 (also top 1% DADA gene) is the least studied protein from the Dickkopf (DKK) family, which includes DKK3 56 and DKK1 56 . The fact that DKK4 did not appear in NetICS's prioritization gene list, nor in PEEPs of LTS, suggests that DKK4 may be more promising in controlling the survival of patients with PDAC rather than explaining individual heterogeneity among long-term PDAC survivors. Identfication of DKK4 as group based DEGs in TCGA cohort further confirm its role in PDAC survival.
The identification of prognostic factors is complicated in the presence of individual-to-individual heterogeneity 57 . Unique tumour biology may determine long-term survival in pancreatic cancer, and detailed individual-specific omics profiling may be required to provide novel insights into prognostication for this disease 58 . DEGs alone are unlikely to fully characterize individual (LT) survival, as observed for other complex traits 24 . Previous studies 26,26,59,60 emphasized the existence of subgrouping of PDAC patients in general, based on expression profiling of samples. Our study showed that any LTS patient only exhibits a small fraction of groupwise DEGs in their PEEP profiles and shows a deep level of gene expression heterogeneity. Notably, several genes were uniquely perturbed in an LT survivor, which strengthens our belief that LTS patients exhibit more abundant levels of heterogeneity. Heterogeneity has been observed in lung cancer at gene (genetic aberrations) and cellular level through high throughput techniques 61,62 . Careful inspection of PEEPs across LT survivors highlighted specific biological signatures associated with focal adhesion 63 , and extracellular matrix receptors 64 , which helps understand why these patients with PDAC survived longer. Furthermore, it is notable that multiple PDAC responsive pathways 65 were enriched across several LT survivors and, based on the perturbed gene sets, led to further subgrouping of LT survivors. Understanding these pathways may provide novel insight into the long-term survival mechanism in PDAC. PEEP analysis identified FCGR3A, a potential biomarker in PDAC 66 . Two genes, NOSTRIN and ADGRG6, were shared by 66% of LTS, and have been reported before to be associated with PDAC survival 52,67 . In independent datset cohort A, NOSTRIN gene was found to be shared in atleast two LT survivors.
Drugs bind to their target proteins and may ultimately perturb the transcriptome of a cancer cell 68 . Establishing a causal link between a gene and a disease outcome experimentally remains time-consuming 69 . In our study, analytic functional analysis of individual PEEPs helped to decode homogeneity patterns within LTS. Heterogeneity at the gene level may go hand in hand with homogeneity at the pathway level as different gene perturbations may lead to disruptions in the same molecular pathway. Network-centric approaches resulted in various oncogenes such as CUL1, a central component of SCF 70 , EGF, FOSL1 71 , MMP9 72 , and TGFB1 42 , already known as emerging attractive anticancer targets. Different transcription factors (GLI2 and GL3) were identified, linked to the KRAS mechanism of pancreatic tumorigenesis 73 . Identified Immunogenic gene (CDON) and epigenetic regulatory gene (HDAC1) targets could play significant roles in the future immunotherapeutic strategies in long-term PDAC survivors 58 . CD8 revealed in our study is in line with recent studies in which CD8 expression profiling was linked to an immunologic subtype of PDAC with favorable survival 74 . These results, despite the small sample sizes to work with, indicate the possible advantages of employing an integrative analysis pipeline, such as combining knowledge about network-driven disease modules with individual-specific gene perturbation profiling. Unlike DEG-oriented therapeutic target selection for cancers, commonly used to date, we promote the exploitation of analytic frameworks in which multiple network-centric approaches are used for the identification of patient-specific therapeutic targets. This will boost cancer prognosis and treatment in the context of personalized medicine.

Methods
Data collection and sequencing. Patient selection, ethical statement, and criteria to maximize the definition of STS and LTS. All aspects of the study comply with the Declaration of Helsinki. PDAC patients from Liege University Hospital were recruited based on an opt-out methodology from 2007 to 2014, giving to N = 96 pancreas tissue. All participants signed the written informed consent prior to the enrollment. The study was approved by the local institutional ethical board ("Comité d' éthique hospital-faculties universities de Liège) under the reference number B707201627153.
Tissues were obtained from the University of liege Biobank, Belgium. This work is a retrospective study. Between 2007 and 2014, 96 patients were admitted to the CHU Liège for pancreatic cancer. Among the 96 patients, only patients who went a tumour resection were selected to perform RNA sequencing on the tumour tissues. Next, two groups with different statuses of survival were selected: (1) 21 patients who have an overall survival comprised between 3 and 12 months after pancreas cancer diagnosis were selected as the short term survivor group; (2) 15 patients who survived more than 36 months after pancreas cancer diagnosis were selected as the long term survivor group. Patients who died three months after diagnosis or in the period between 12 and 36 months after diagnosis were not included in the study to potentially maximize the molecular differences between long-and short-term survivor groups. We performed RNA extraction from those 36 samples and processed for RNA quality check. The clinical description of patients, treatments and patient outcome is available in supplemental Table S1; Fig. 2A (overall survival curve). www.nature.com/scientificreports/ RNA extraction, library preparation, sequencing. Tumor areas were determined by a certified pathologist and were manually macro-dissected from the FFPE tissues. RNA was extracted using an All Prep DNA/RNA/ miRNA Universal kit (Qiagen, Belgium) according to the manufacturer's protocol. The RNA quality (N = 36) was assessed using a BioAnalyzer (Agilent, Belgium), and the proportion of RNA with a length higher than 200 bases (DV200) was measured. Only 19 out of 36 met a suitable RNA quality, allowing for sequencing. TruSeq® RNA Access Library Prep Kit (Cat. No. RS-301-2001 and RS-301-2002) (Illumina, The Netherlands) was used to prepare libraries, and next-generation sequencing was performed on a NextSeq500 apparatus (Illumina, The Netherlands), in paired-end 2 × 75 bp high output mode. We performed a series of transcriptome computational analyses to better understand patient heterogeneity between LT and ST survivors. After quality control and adaptor trimming with Trimmomatic 75 , sequence data were mapped to the Genome Reference Consortium GRCh38 assembly using STAR v2.5.2 76 . Read counts for known genes were generated using the function HTSeq-count v0.6.1p 77 and data were normalized in DESeq2 v1.20.0 78 as shown in Fig. 1. The study's analytic workflow is depicted in Fig. 1A-L. Clinical features of patients. Various clinical and pathological parameters of patients (N = 19) were included in the analysis. In particular, we collected the following pathological clinical data: age, sex, tumor size, number of lymph nodes evaluated, tumor grade, surgey magins invaded by tumor cells (during or after surgery, a pathologist examines rim of tissue called the surgical margin or margin of resection to be sure it contains no cancer cells), time between surgery and chemotherapy (in days), time between surgery and relapse (in months), disease-free survival (DFS), vascular resection, time in hospital after surgery (in days), re-hospitalization six months after surgery, vascular contact, artery contact, and chemotherapy as shown in Fig. 2A.

Group-level and Individual-specific analyses. Group based DEGs analysis: Differential Gene analysis
and functional follow-up. We used DEseq2 78 for the identification of differentially expressed genes (DEG), with the thresholds log2 fold change ≥ 2 and ≤ − 2, to indicate up-regulation and down-regulation, respectively (Fig. 1B). Significance was assessed at an unadjusted p-value < 0.05 in LT versus ST group comparison 79 . We used the ClusterProfiler v3.8.1 80 package to predict various GO processes enriched in differentially expressed genes (DEGs). To identify the protein domain in DEGs, we used batch CD-Search 81 . For deeper analysis, we downloaded the gene-specific InterPro domains from the Ensembl biomart (https:// www. ensem bl. org/ bioma rt/ martv iew) database. Further, we performed the enrichment analysis of InterPro domains with DEG in ClusterProfiler v3.8.1 package. Identified DEG was analyzed for detection of prognostic genes, with a log-rank test in a Kaplan-Meier survival model 82 (Fig. 1A,B). For each gene, patients were classified into two groups, the high-expression group (H), mid-expression group (M) and the low-expression group (L), using the expression median of the gene as a cutoff using the survminer 72 (v. 0.4.6) R package. Furthermore, to validate identified survival-associated DEGs, we downloaded the three datasets (cohort A: https:// www. ncbi. nlm. nih. gov/ geo/ query/ acc. cgi? acc= GSE17 891; cohort B: Notta et al. 2016 (https:// ega-archi ve. org/ datas ets/ EGAD0 00010 01956); cohort C: TCGA (https:// gdac. broad insti tute. org/). We extracted the processed data of three cohorts with the help of Bioconductor MetaGxPancreas R package (https:// bioco nduct or. org/ packa ges/ relea se/ data/ exper iment/ html/ MetaG xPanc reas. html). All the samples in three cohorts were classified into ST and LT with survival < 12 months and > 36 months and differential gene expression analysis was performed with limma (https:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ limma. html) with pvalue < 0.05. Identified DEGs from three cohorts were checked for overlap with significant genes identified from this study.
Group-level survival heterogeneity: WGCNA for gene module prediction and assessment of clinical relevance. The minimum sample size to run weighted gene co-expression network analysis (WGCNA) is at least 15. Therefore, WGCNA v1.63 32 was used on pooled ST and LT PDAC survival patients to generate a transcriptional network from the normalized expression data. The weighted coefficient β was selected based on scale-free topology criteria. The adjacency coefficient α was computed using the power to measure the correlation strength between two genes. The adjacency matrix was created based on α, which was subsequently transformed into a topological overlap matrix (TOM). The distance measure dissTOM = 1 − TOM, served as input to perform average linkage hierarchical clustering (with DynamicTreeCut 83 ), giving rise to gene co-expression modules. Gene modules were shown as branches of the resulting pruned tree. It was followed by the calculation of module eigengenes (MEs), defined as the 1st linear principal component of each co-expression module. The hierarchical clustering of MEs was performed to study associations between modules. Approximate non-parametric association tests were used to investigate the association between MEs and PDAC clinical traits. In effect, we used two methods to identify modules related to clinical progression traits. First, within-module gene significance was identified for every module and all available clinical traits. Average gene significance for a module was defined as "module significance", following recommendations of 84 . Second, rank-based correlation (r) was performed among each ME with the multiple clinic pathological characteristics available in this study (adjusted p-value for 0.05 MEs). We used parametric (Pearson correlation coefficient) and non-parametric (Spearman rank) tests for each continuous and categorical data. In order to assess the functional relevance of clinically associated modules, we used ClueGO 85 , a Cytoscape plug-in, to visualize the non-redundant biological terms for genes in a functionally comparative network from multiple clusters. Non-redundancy was assessed via two-sided hypergeometric testing for enrichment/depletion (Bonferroni adjusted p-value < 0.05). Cytoscape 5.0 86 was used for visualizing gene interaction networks (Fig. 1C).
Bailey et al. 26 reported four subtypes in PDAC i.e. ADEX, Immunogenic, Squamous and Pancreatic Progenitor. We used the SubMap module in GenePattern (https:// www. genep attern. org/) to identify the association of studied ST and LT groups to the known PDAC subtypes 26 . Subtypes identified in Puleo et al. 25 were also used to www.nature.com/scientificreports/ identify the association with each sample. For this, a centroid-based supervised classification dataset was used and applied to each LT and ST PDAC sample from this study. Next, the correlation coefficients between each sample and the reference subtype centroid were used as a prediction score.
Individual-specific survival heterogeneity: quantification of heterogeneity between individual transcriptome profiles, with functional and clinical relevance. We used principles of the PEPPER 24 method to construct personalized gene expression perturbation profiles for each of N = 19 PDAC subjects. PEPPER requires a target class of individuals and a reference class (Fig. 1D). In this study, we took LT PDAC survivors as target group and considered ST survivors as reference (i.e., the most abundant group in real-life). The approach captures the extent to which gene i is perturbed in subject j via a Z-score. This Z-score indicates how many standard deviations the individual's gene expression is away from the mean value of the reference group. As a threshold, we used |z|= 2. Positive z-scores > 2 would indicate up-regulation, negative z-scores < -2 would indicate down-regulation. Given the small sample sizes to work with in this study, we reshuffled the ST/LT group labels 87  In the aforementioned PEEPs analyses (PEEP: an individual perturbation expression profile against a reference), no notion of gene-connectivity was used. However, gene connectivity via reference networks can further highlight interesting gene clusters linked to LT survivors. Here, we considered physical interaction data as available from ConsensusPathDB 91 , and obtained 373,101 links between N = 19,117 genes. Starting with genes in pathways that already have been implied in PDAC via 68 , and supplementing these genes with searches in the DisGeNet database 34 (search term = "Pancreatic Diseases"), resulted in 53 seed genes (Fig. 1I). We then used DADA's module detection algorithm 6 to augment the initial list of 53 seed genes and identify PDAC disease modules. The top 1 percent highest-ranked genes were considered to form a disease module. Significantly perturbed genes (in LT survivor PEEPs) were mapped on the identified disease module. This allowed putting LT survival individual specific genes in the context of gene connectivity and gene neighborhoods. All DADA top 1 percent genes were checked for retrieval in previous analyses (Fig. 1J,L). As an alternative approach to exploit gene interaction network structure, we adapted NetICS 92 , an approach initially intended to prioritize cancer genes on a directed functional interaction network. It uses an individual-specific list of genes via bidirectional network diffusion of two layers of information (Fig. 1E). As the first layer, we took the individual-specific significant genes as highlighted in the LT PDAC survival PEEPs analyses before (instead of mutant genes per sample in the original NetICS implementation). As second layer we took groups specific DEGs. Individual-specific gene ranks (for LT survivors) were aggregated via NetICS methodology into an overall ranked list of genes, with restart probability of 0.4. The top 1% percent ranked genes were retained. Similar to follow-up of DADA top-ranked genes, we checked for the frequency of NetICS derived top-ranked genes that were also retrieved in former analyses (Fig. 1F-H).

Conclusion
In this study, we performed a series of transcriptome computational analyses to better understand PDAC survival heterogeneity. To our knowledge, we demonstrated and applied for the first time in PDAC samples an integrative analytic workflow, combining clinical and omics data, and individual-level and group-level transcriptome profiling. In addition, we showed the utility of network-based approaches, disease modules and multi-scale functional analyses (gene, protein domain, pathway), that led to the identification of known oncogenes and genes with promising therapeutic potential, as well as genes that highlighted gene-level heterogeneity among long-term PDAC survivors. From both the group and individual level analysis, we found various gene targets and their role in immune specific pathways in PDAC survival mechanism. Hence, all the analysis confirms the role of immune specific pathways as potential therapeutic targets for PDAC survival.

Softwares used
All analyses have been conducted according to software packages discussed in the method section. We have utilized the following software packages in our present study DESeq2 78 (differential analysis at group level), WGCNA 32 (module detection in gene expression data), PEPPER 24 (differential analysis at individual level), ClusterProfiler 80 v3.8 (functional analysis of differential genes), survminer 93

Data availability
Data deposited in GEO with accession number GSE150043.